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Abstract 

In this paper, we propose a stratified sampling algorithm in which the 
random drawings made in the strata to compute the expectation of interest 
are also used to adaptively modify the proportion of further drawings in 
each stratum. These proportions converge to the optimal allocation in 
terms of variance reduction. And our stratified estimator is asymptotically 
normal with asymptotic variance equal to the minimal one. Numerical 
experiments confirm the efficiency of our algorithm. 

Introduction 

Let X be a Revalued random variable and / : M. d — > K a measurable function 
such that E(/ 2 (X)) < oo. We are interested in the computation of c = E(/(X)) 
using a stratified sampling Monte-Carlo estimator. We suppose that (Aj)i<i<j 
is a partition of M. d into I strata such that pi — F[X £ Aj] is known explicitely 
for i£ {1, . . . , /}. Up to removing some strata, we assume from now on that pi 
is positive for all i € {1, . . . , I}. The stratified Monte-Carlo estimator of c (see 
[G04] p. 209-235 and the references therein for a presentation more detailed than 
the current introduction) is based on the equality E(f(X)) = X)j=iPi^(/(-^-t)) 
where Xi denotes a random variable distributed according to the conditional law 
of X given X £ A;. Indeed, when the variables Xi are simulable, it is possible 
to estimate each expectation in the right-hand-side using Ni i.i.d drawings of 
Xi. Let N — Yli=i Ni be the total number of drawings (in all the strata) and 
<Zi = Ni/N denote the proportion of drawings made in stratum i. 
Then c is defined by 

I Ni I qi N 

where for each i the Xf's, 1 < j < Ni, are distributed as Xi, and all the 
Xf's, for 1 < i < I, 1 < j < Ni are drawn independently. This stratified 
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sampling estimator can be implemented for instance when X is distributed 
according to the Normal law on M. d , Ai = {x € M. d : yi-\ < u'x < yi} where 
— oo = yo < 2/1 < . . . < yi-i < yi = +00 and ti 6 R d is such that |u| = 1. 
Indeed, then one has pi — N{y{) — N(yi^i) with N(.) denoting the cumulative 
distribution function of the one dimensional normal law and it is easy to simulate 
according to the conditional law of X given yi~\ < u'X < yi (see section [3721 for 
a numerical example in the context of options pricing). We have E(c) = c and 



2—1 4—1 l—l 1—1 

where af = Y(.f(X t )) = W(f(X)\X e AA for all 1 < i < I. 
During all the sequel we consider that 

(H) <Ji > for at least one index i. 



The brute force Monte Carlo estimator of Ef(X) is /P^); with the 



ir 01 i^j(^) is ^2.^ 

X-^'s i.i.d. drawings of X. Its variance is 
1 /i 




N 



E 2 (/(^)))- 5>E(/pQ)) 



For given strata the stratified estimator achieves variance reduction if the 
allocations Ni or equivalently the proportions q-i are properly chosen. For in- 
stance, for the so-called proportional allocation qi — Pi, Vi, the variance of the 
stratified estimator is equal to the previous lower bound of the variance of the 
brute force Monte Carlo estimator. For the choice 

ft = 7^7 =: ft , V 1 < 1 < I, 

the lower-bound in l|0.ip is attained. We speak of optimal allocation. We then 
have 

2 „2 



, , pier,- , 
i=l 

and no choice of the qi 's can achieve a smaller variance of c. 

In general when the conditional expectations E(f(X)\X S ^4^) = E(/(Xj)) 
are unknown, then so are the conditional variance of. Therefore optimal al- 
location of the drawings is not feasible at once. One can of course estimate 
the conditional variances and the optimal proportions by a first Monte Carlo 
algorithm and run a second Monte Carlo procedure with drawings independent 
from the first one to compute the stratified estimator corresponding to these 
estimated proportions. But, as suggested in [A04] in the different context of 
importance sampling methods, it is a pity not to use the drawings made in the 
first Monte Carlo procedure also for the final computation of the conditional 
expectations. 

Instead of running two successive Monte Carlo procedures, we can think to 
get a first estimation of the oVs, using the first drawings of the Xj's made to 
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compute the stratified estimator. We could then estimate the optimal alloca- 
tions before making further drawings allocated in the strata according to these 
estimated proportions. We can next get another estimation of the oVs, com- 
pute again the allocations and so on. Our goal is thus to design and study such 
an adaptive stratified estimator. The estimator is described in Section [TJ In 
particular, we propose a version of the algorithm such that at each step, the 
allocation of the new drawings in the strata is not simply proportional to the 
current estimation of the optimal proportions but chosen in order to minimize 
the variance of the stratified estimator at the end of the step. A Central Limit 
Theorem for this estimator is shown in Section [H The asymptotic variance is 
equal to the optimal variance of and our estimator is asymptotically optimal. In 
Section [3l we confirm the efficiency of our algorithm by numerical experiments. 
We first deal with a toy example before considering the pricing of an arithmetic 
average Asian option in the Black-Scholes model. 

Another stratified sampling algorithm in which the optimal proportions and 
the conditional expectations are estimated using the same drawings has been 
very recently proposed in [CGL07] for quantile estimation. More precisely, for 
a total number of drawings equal to N, the authors suggest to allocate the A 7 
with < 7 < 1 first ones proportionally to the probabilities of the strata and 
then use the estimation of the optimal proportions obtained from these first 
drawings to allocate the N — A 7 remaining ones. Their stratified estimator is 
also asymptotically normal with asymptotic variance equal to the optimal one. 
In practice, A is finite and it is better to take advantage of all the drawings 
and not only the A 7 first ones to modify adaptively the allocation between the 
strata. Our algorithm works in this spirit. 

1 The algorithm 

The construction of the adaptive stratified estimator relies on steps at which we 
estimate the conditional variances and compute the allocations. We denote by 
N k the total number of drawings made in all the strata up to the end of step k. 
By convention, we set A = 0. In order to be able to make one drawing in each 
stratum at each step we assume that N k — N k ~ 1 > I for all A: > 1. 

For all 1 < i < I we denote by N k the number of drawings in stratum i till 
the end of step k with convention Nf = 0. The increments M k = N k — A- 5 S 
are computed at the beginning of step k using the information contained in the 
A fc_1 first drawings. 

STEP k > 1. 

Computation of the empirical variances. 
If A; > 1, for all 1 < i < I compute 




If k = 1, set a° = 1 for 1 < % < I. 

Computation of the allocations M k = N k — N k 
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We make at least one drawing in each stratum. This ensures the convergence 
of the estimator and of the erf's (see the proof of Proposition 11.11 below) . 
That is to say we have, 



V 1 < i < I, Aff = 1 + ml, with ml € N, (1.1) 

and we now seek the rh k \. We have 53i=i ™i = N k — N k ~ x — /, and possibly 
m\ = for some indexes. 

We present two possible ways to compute the m k, s. 



a) We know that the optimal proportion of total drawings in stratum i for 
the stratified estimator is q* = „ I Pi(7i — , so we may want to choose the vector 

(m k , . . . , rhj) £ N 1 close to (m k , . . . , m k ) E R' + defined by 



^k— 1 

m k = , - (N k - N*' 1 - I) for 1< i < I. 

This can be achieved by setting 

m\ = \m\ + . . . + m k \ - \m\ + . . . + rn^iJ, 

with the convention that the second term is zero for i = 1. This systematic 
sampling procedure ensures that J2i=i = N k — N k ~ x — I and m k — 1 < 
m k < m k + 1 for all 1 < i < I, In case a k ~ — for all 1 < i < I, the above 
definition of m\ does not make sense and we set m k = Pi{N k — N k ~ 1 — I) 
for 1 < i < I before applying the systematic sampling procedure. Note that 
thanks to (H ) and the convergence of the of (see Proposition 11.11 below) , this 
asymptotically will never be the case. 

b) In case a k ~ 1 = for all 1 < i < I, we do as before. Otherwise, we may 
think to the expression of the variance of the stratified estimator with allocation 
Ni for all i, which is given by l|0.ip . and find (m k , . . . , m k ) G that minimizes 



under the constraint E*=i m i = N k — iV fc_1 — I. 

This can be done in the following manner (see in the Appendix Proposi- 
tion EE]): 

For the indexes i such that S^ -1 = 0, we set m k = 0. 

We denote I k the number of indexes such that of -1 > 0. We renumber 
the corresponding strata from 1 to J . We now find (m k , . . . , m k k ) € that 

Ji=\ N k-l + 1+m k 

by applying the three following points: 

i) Compute the quantities J-k-i and sort them in decreasing order. Denote 

by ^ fc _i the ordered quantities. 



minimizes ^2 i=1 Ji-i i k ; under the constraint m i = N ~ N — I, 
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ii) For i — 1, . . . , I compute the quantities 

N k - N*' 1 -1+^2 (N^ 1 + 1) 
j=»+i 



i k 

E pw^T 1 

Denote by i* the last i such that 

N k-1 , -. Z-^i u) 

E p«<4) 

J'=i+1 



iii) Then for i < i* set = and for i > i* , 



If this inequality is false for all i, then by convention i* = 0. 



N k-i_ I + ^ (JVg^ + l) 



E po^S 1 

This quantity is non-negative according to the proof of Proposition 14.11 

We then build (m k ,...,m k ) by reincluding the I — I k zero valued m k, s 
and using the initial indexation. Finally we deduce (rh k , . . . , mf ) G N 7 by the 
systematic sampling procedure described in a/ 

Drawings of the X, 's. Draw i.i.d. realizations of Xi in each stratum i 
and set N k = N^ 1 +M k . 

Computation of the estimator 
Compute 

i—l i j=i 

Square integrability of /(X) is not necessary in order to ensure that the 
estimator is strongly consistent. Indeed thanks to Ijl.ljl . we have N k — > oo as 
k — > oo and the strong law of large numbers ensures the following Proposition. 

Proposition 1.1 IfE\f(X)\ < +oo, then 

c* ► c a.s.. 

k — >oo 

If moreover, E(/ 2 (X)) < +oo, then a.s., 

i 

VI < i < I, a k ► Oi and pi$ k ► <r*. 

i=l 
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2 Rate of convergence 

In this section we prove the following result. 

Theorem 2.1 Assume (H), E(f 2 (X)) < +00 andk/N k -> as k 00. T/ien, 
Msm<? either procedure a) or procedure b) for the computation of allocations, one 
has 

k—>oc 

With Proposition ll.il one deduces that „ J v/ ^ T _ t (c fc — c) '" law > A/YO, 1), which 

enables the easy construction of confidence intervals. The theorem is a direct 
consequence of the two following propositions. 

Proposition 2.1 IfE(f 2 (X)) < +00 and 

Vl<*</, |^ 9 «a. S ., (2.1) 



then 



k — >oo 



Proposition 2.2 Under the assumptions of Theorem \2.1\ using either proce- 
dure a) or procedure b) for the computation of allocations, (|2.ip holds. 



We prove Proposition 12.11 and 12.21 in the following subsections. 
2.1 Proof of Proposition I2TT1 

The main tool of the proof of this proposition will be a CLT for martingales 
that we recall below. 

Theorem 2.2 (Central Limit Theorem) Lei(// n ) n6 N be a square- integrable 
(•Fn)neN -vector martingale. Suppose that for a deterministic sequence (~f n ) in- 
creasing to +00 we have, 

i) ' 

(PU r r 

ii) The Lindeberg condition is satisfied, i.e. for all e > 

1 ™ 

— 2J E H^ fe ~~ l ik - 1 W 2l {\\^-^k-i\\>e^u:}\ :F k-l 

ln fc=l 

Then 

Jin W-^^o r)< 
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As we can write 



i AT fc 



V Efii(/(^)-E/(x / )) J 



we could think to set Mfe := (e£(/(*D - E/W),--ES(/W 



E/(Aj))J and try to use Theorem l2.21 Indeed if we define the filtration (Gk)keN 

by C/fe = a(lj <N kXf, 1 < i < I, 1 < j), it can be shown that (/tfe) is a (Gk)- 
martingale. This is thanks to the fact that the N k, s are (J^-i-measurable. Then 
easy computations show that 

1 / \ j- /,JVf 2 N k 2 , 
— ( M ) fc = diag((^a 1 ,...,— ^ctj) 



5 V v iV fe 



N k 



where diag(v) denotes the diagonal matrix with vector v on the diagonal. 
Thanks to H2.ll) we thus have 



diag 



' * 2 



* 2 N 



and a use of Theorem l2.2l and Slutsky's theorem could lead to the desired result. 

The trouble is that Lindeberg's condition cannot be verified in this context, 
and we will not be able to apply Theorem l2.2l Indeed the quantity | — /ifc-i 1 1 2 
involves 7V fe — N k ~ 1 random variables of the type Xi and we cannot control it 
without making some growth assumption on N k — iV fe_1 . 

In order to handle the problem, we are going to introduce a microscopic 
scale. From the sequence of estimators (c k ) we will build a sequence (c n ) of 
estimators of c, such that c k — c N , and for which we will show a CLT using 
Theorem 12. 21 It will be possible because it involves a new martingale (/i n ) such 
that n n — is equal to a vector the only non zero coordinate of which is one 
random variable f{X(). Then the Lindeberg condition will be easily verified, 
but this time we will have to work a little more to check the bracket condition. 
As the sequence (c k ) is a subsequence of (c n ), Proposition 12. II will follow. This 
is done in the following way. 

Let n € N* . In the setting of the Algorithm of Section Q] let k € N such that 
N k - X < n < N k . Given the allocations (iY?)[ =1 , for < I < fc, we define for 
each 1 < i < I a quantity v™ with the inductive rule below. Each vf is the 
number of drawings in the i-th strata among the first n drawings and we have 
J2i=i — n. We then define 
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Rule for the i/f's 
For n = 0, if = 0, for all 1 < i < I. 



rk-1 



I N —N 

1. For k > set rf := ^.^-i for 1 < i < I. 



rk-l 



n-1 _ N k-1 



( k v, - ivr 

i n = argmax r, y 

i<i<i V n-N K 1 



If several i realize the maximum choose i n to be the one for which r k is 
the greatest. If there are still ex aequo's choose the greatest i. 

3. Set i/? = if" 1 + 1, and vf = v^ 1 if % £ i n . 



There is always an index i for which r\ — -* _ N k-i > 0, since 



i=l i=l 

Moreover, for the first n S {N^ 1 + 1, . . . , N k } such that f"" 1 = N k in the 

i strata, rf — v * 
This implies that 



i-th strata, rf - < and v? = v? = N k for n < n' < N 



v* = JV* Vl<i< J, VfceN, 



and as a consequence, 

c k = c N ". (2.2) 



Therefore Proposition [2TT] is an easy consequence of the following one. 
Proposition 2.3 Under the assumptions of Proposition [2J\ 

^(c"-c)^UAA(0 ) ^). 



In the proof of Proposition 12.31 to verify the bracket condition of Theorem 
2.2\ we will need the following result. 



Lemma 2.1 When (|2.ip holds, then 

VI < i < I, ► q* a.s. 

Tl n— >oo 

Proof. Let be 1 < i < /. During the sequel, for x € M*. or n € N* , the integer k 
is implicitely such that N k ~ 1 < x, n < N k . 
We notice that for any neN* 



it 



N k-i ^ n - N k-i> 
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and define for x € M5_, 



■N k -N k ~ 1 x '7V fc - r 



We will see that, as n tends to infinity, f(n) tends to q* and f(n) — ^- tends 
to zero. 

Computing the derivative of / on any interval (iV fe_1 , N k ] we find that this 
function is monotonic on it. Besides f(N k ~ 1 ) — fy-i an d f(N k ) = jh. So if 
jfjr tends to q* as k tends to infinity, we can conclude that 

f(n) >q*. (2.3) 

n — >oo 

As rf = M ' k _ M ' k - 1 we now write 



/(«) = 



r k 



n ' ' n \ n — N k 1 

We conclude the proof by checking that 

* - 7 ~ 1 v l ~ N t X k 1 
r i n -N k - l< n-N k - 1 < r < + n - A^-i ■ \ 1A > 

Indeed, this inequality implies 

< — - f{n) < -, 

n n n 

which combined with (|2.3j) gives the desired conclusion. We first show 

' ?vrfc— i (2-5) 



n - N^ 1 1 n - N^ 1 ' 

We distinguish two cases. Either vf — N^ 1 for all N k ~ 1 < n' < n, that is to 
say no drawing at all is made in stratum i between iV fc_1 and n, then l|2.5j) is 
trivially verified. 

Either some drawing is made between N k ~ x and n. Let us denote by n' the 
index of the last one, i.e. we have vf = = _1 + 1. As a drawing is made 

at n' we have u \,_ Nk l! < r* . 



We thus have, 



n'-l _ N k-1 n'-l _ N k-1 



< — - < r 

n-N k - 1 ~ n'-N^ 1 

and 

yf-Nf- 1 is?- 1 + 1- N k ~ x 



k 



and thus we have again l|2.5p . 

Using now the fact that 1 = J2i=i r i = J2i=i nJj^-i we get 



n - N^ 1 1 ^ \ 3 n- iV fe -! 



Using this and (f275]l we get ([2^4]) . □ 
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Proof of Proposition 12. 3l For n > N 1 , > 1 for all 1 < i < I and we can 

write 

i 



VE(c n - c) = 



(2.6) 



with 



( E£i(/(^')-E/(Xx)) \ 



and by Lemma |2~T 



Note that if <Ji = for a stratum i, then q* 

+oo which may cause some trouble in the convergence analysis. In compensa- 
tion, u, = means that /PQ) — E/(JQ) = a.s. Thus the component /u^ of /x n 
makes no contribution in c" — c. So we might rewrite l|2.6p with fi n a vector of 
size less than /, whose components correspond only to indexes i with <7j > 0. 
For the seek of simplicity we keep the size I and consider that er,; > for all 
1 < i < I. 

If we define T n '■= a {^-j<v^Xl, 1 < i < I, 1 < j), then (fi n ) n >o is obviously 
a -martingale. Indeed, for n <E N* let k € N* such that 7V fc_1 < n < N k . 
For 1 < i < I the variables A^ -1 and JV* are respectively and J-^k-i- 

measurable (Step fe > 1 in the Algorithm). As for each 1 < i < I the quantity 
vf depends on the N i ~ 's and the A^'s, it is ^jyfc-i-measurable. Thus [i n is 
J-"„-measurable and easy computations show that E[/i„ + i l^n] = /i„. 

We wish to use Theorem 12.21 with 7„ = n. We will denote by diag(eii) the 
I x I matrix having null coefficients except the i-ih diagonal term with value a^. 

We first verify the Lindeberg condition. We have, using the sequence (i n ) 
defined in the rule for the f"'s, 



I Mi 



W-iII 2i {||m,- w 



- 1 ||>e-v^}l« ?7 J-l] 



^Er=iE[i/(< !i )-iE/(x l; )i 2 i H i^.j 



< ^Er=l SU Pl<i</ E D/( X «') - E /(^)| 2l {|/(X 4 )-E/(X 1 )|>e^] 

= su Pl < i < / E[|/(X i ) -E/(JQ)l 2l {|/(xo-E/(xoi> e -vM]- 



As 



sup Efl/pQ-E/pQ)! 2 ! 



Ki<7 



the Lindeberg condition is proven. 

We now turn to the bracket condition. We have, 



(M>« = Ek=l E [(Mk-Mk-l)(Mk-Mk-l)'|^k-l] 
- ELi diag(E[ |/(*<) - E/(X lfc )| 2 ]) 

= ELidiag(<). 
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Thus, we have 

(m)« 



diag( (—erf, . . . , -L<jf) ) ► diag( (q^af, . . . , q}af) ) a.s. 



where we have used Lemma 1270 
Theorem 12.21 implies that 



JJn jnlaw^ ^ diag( {g * 2 , 2) ) A (2 ?) 

Using again Lemma l2"7Ll we have 

(Pi— ) ►(—,-••,—) a -s- (2.8 

Using finally Slutsky's theorem, IpES]) . (|27Fj) and JH]), we get, 

inlaw 



^(c"-c) ^UAT(0,a, 2 ). 



□ 



2.2 Proof of Proposition [2721 

Thanks to (iJ) and Proposition 11.11 there exists K € N s.t. for all fc > if 

we have J2i=iPi&i > 0- The proportions (p^ = „/' CTi „ fc )i are well defined 

for all > if and play an important role in both allocation rules a,) and &,). 
Proposition 11.11 implies convergence of p\ as k — > +oo. 



Lemma 2.2 Under the assumptions of Theorem \2.1\ 
VI < i < i, 0? ► a.s. 

A; — *oo 

Proof of Proposition 12.21 for allocation rule a). Let be 1 < i < I. We 

have = jc+ ^-l 1 mi ■ Using the fact that m\ — 1 < fh\ < m\ + 1 we can write 

TLM < < 2* , EtX 
_/yfc - _/yfc - jyfc ^yfc 

We will show that ^4 — 1 — ► qt, and, as jp; — > 0, will get the desired result. 
For k > K + 1 , we have 

EL r»j _ ££i mj EUi gKgj - N'- 1 ^ 
Nk j\[k j\[k 

= TlM N*-N* 1 ^ /(fc-JT) 1 A, 

AT* iV fc N k - N K ^ Pi N k k-K ^ Pi 

n=N K + l l=K 

where the sequence (p") defined by p" = p- for TV' -1 < n < N l converges to q* 
as ra tends to infinity. The Cesaro means which appear as factors in the second 
and third terms of the r.h.s. both converge a.s. to q* . One easily deduce that 
the first, second and third terms respectively converge to 0, q* and 0. □ 
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Proof of Proposition I2T2I for allocation rule b). There may be some strata 
of zero variance. We denote by /' (I' < I) the number of strata of non zero 
variance. 

For a stratum i of zero variance the only drawing made at each step will be 
the one forced by (fTTTJ) . Indeed of = for all k in this case. Thus N% = k for 
all the strata of zero variance and since — > 0, we get the desired result for 
them (note that of course q* = in this case) . 

We now work on the /' strata such that cr* > 0. We renumber these strata 
from 1 to J'. Let now K' be such that d k > for all k > K' , and all 1 < i < V . 
For k > K', the integer I k+1 at step k + 1 in procedure b) is equal to I'. 

Step 1. We will firstly show that 

Vk>K',Vl<i<I' J ^<U+ r s f ^ + —). (2.9) 

Let k > K 1 . At step k + 1 we denote by (.)& the ordered index in Point 
i) of procedure b) and by i* k the index i* in Point ii). We also set n k+1 = 
Nf + 1 + m k+1 . By Point iii), for i>i%, 

n k+1 m k+1 - 1 - Mk -i- 1 Mk+l Afk 7" _l s^ 1 ' { \jk 



L/-\ III// - \ 

(«)* ._ Wk 



N k )k + 1 ^ N k ^-N k -I + J2Ul + i( N L+V 



(2.10) 

Case 1: i* k = 0. Then, in addition to the drawing forced by (|l,lj) . there are 
some drawings at step k + 1 in stratum (l)fc, and consequently in all the strata. 
Thus ([2~T0l) leads to 



n k+i _ n k I M k+i 



p\ ^N k+1 -N k -I + I' + ^2 N jj ' VI < « < 

But N k = ^2j=i N kj rk(I—I') and, following the systematic sampling procedure, 
we have 

N k+1 < n k+1 + 1, VI < i < I'. (2.11) 

Thus, in this case, 

N k+1 , 1 

+ ^ vi<*<r. 

Case 2: i* k >0. If i < i%, N k +^ = N k )k + 1 and JSJJ holds. 
T£i>i%, then (|2~T0|) leads to 

"(g _ * ^ +1 -^-/ + E^ +1 K-) fc + i) 

Using (|2.1ip . it is enough to check that 

Nk+ i_ Nk _ I + E ^ +i{N k )k+1) 



Mk+1 V n k 



< 1 (2.12) 
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in order to deduce that (|2,9p also holds for i > i* k . 

AT,*.. +1 

If Nk+ \ \ < 1, then inequality (|2 . 1 2[) holds by the definition of it. 

.+i nK +i 

K N^pK, > 1 we have 

Nk+i p k — > 1) Vi < and thus 

if it 

j'=i i=i 
This inequality also writes 

N k -k(I-n + I>- £ (N k )k +l)>N k ^(l- £ 4j, 

and (f2TT2]) follows. 

f. Let 1 < i < I', We set n k := iV* — k (this the number of drawings 
in stratum i that have not been forced by Ql.lj) ). 
Using l|2.9p we have 

y h >K> N t +1 -(k + l) < N k + l-(k + l) ( k k 
and thus 

Let e > 0. Thanks to Lemma [221 there exists feo > if' s.t. for all k > ko, 

k 

Affc+i 



/a? - a?£ft < Q* + e. Thus 



n k 



Vfc>fco, ^r<-^V (tf+ e ). (2.13) 
By induction 

fi k fi k ° 

— k — "-0 — ft — k 

Indeed suppose jfa < ^jt V (q* + s) . H jft < q* + s then —fax < q* + e and 

using l|2.13p we get jft+T < Qi + £ - Otherwise n k — n^' and using (|2. 13f) we are 
done. 

n k ° n k 

But as -^f — > as fc — > oo we deduce that limsup fe jfz < q* + e. Since this 

is true for any e, and — * 0, we can conclude that limsup fe j-fe < q* . Now 
using the indexation on all the strata and the result for the strata with variance 
zero, we deduce that for 1 < i < 7, 



liminft — V — liminft ( 1 — Y\ =i — =7- ) 1 — Y^ =1 limsupi. — %■ 

N k \ N k J ~ Pk N k 



= i - £*=i q] = ql 



This concludes the proof. □ 
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3 Numerical examples and applications to option 
pricing 

3.1 A first simple example 

We compute c = EX where X — W(0, 1). 

Let 7=10. We choose the strata to be given by the a-quantiles y a of the 
normal law for a = i/I for 1 < i < I. That is to say Ai = (yi^±,yj.] for all 
1 < i < I, with the convention that yo — -co and y\ = +oo. 

In this setting we have Pi = 1/10 for all 1 < i < I. 

Let us denote by d(x) the density of the law A/"(0, 1). Thanks to the relation 
d'(x) = —xd{x) and using integration by parts, we can establish that, for all 
l<i<I, 

Efxiy^x^^ = - d(y<), 

and 

^(^X 2 ly 1 _ 1< x<y^ = yi^idiy^i) - y^d(y^)+pi, 

with the convention that yod(y ) = yid(yi) = 0. 

We can then compute the exact of = W(X\X 6 Ai)'s and the optimal 
standard deviation of the non-adaptive stratified estimator, 

a* = ^^pi(Ti ~ 0.1559335 

i=l 

We can also for example compute 

q* 5 = 0.04685 

This will give us benchmarks for our numerical tests. 

We will compute c k for k = 1, . . . , 4. We choose N 1 = 300, iV 2 = 1300, 
N 3 = 11300 and N 4 = 31300. 

First for one realization of the sequence (c fc )| =1 we plot the evolution of 
when we use procedure a) or b) for the computation of allocations. This is done 
on Figure [TJ 

We observe that the convergence of to q§ is faster with procedure b). 

Second, to estimate the variance of our adaptive stratified estimator, we 
do L = 10000 runs of all the procedure leading to the sequence {c k )\ =1 . For 
1 < k < 4 we compute, 

1=1 1=1 

with the ([c k ] 1 ) 1<1<L independent runs of the algorithm till step k. This esti- 
mates the variance of the stratified estimator at step k (N k total drawings have 
been used). To compare with cr* we compute the quantities 

s k = VN k v k 
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0.04^ , 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 , , 

5000 10000 15000 20000 25000 30000 35000 

Figure 1: Successive values of for 1 < k < 4, for procedure a) (the o-line) 
and procedure 6 ) (the *-line) , in function of N k . The horizontal line is at level 



(in other words we compare the standard deviation of our adaptive stratified 
estimator with N k total drawings with the one of the non-adaptive stratified 
estimator with optimal allocation, for the same number of total drawings). 

The values are ploted on Figure El We observe that the convergence to 
er* is slightly faster with procedure b). This corresponds to the fact that the 

convergence of the t^V's is faster with this later procedure (see Proposition [2TTJ) . 

We wish to compare the efficiency of our algorithm with the one of the non- 
adaptive stratified estimator with proportional allocation. Indeed this is the one 
we would use if we did not know the oVs. 

With the same strata as in the previous setting the stratified estimator with 
proportional allocation of c for a total number of drawings TV 4 = 31300 is 

10 3130 
i=l j = l 

We will compare it to c 4 that was computed in the example above. As we have 
seen in the Introduction, the variance of c is 

1 10 

i=i 

We do L = 10000 runs of c 4 and c. We get an estimation v A of the 
variance of c as previously. In a similar manner we get an approximation 

v = \ J2f=i([c} 1 ) 2 ~ (r Ef=i[c] ! ) of the variance of c. 

As Ei^i-Pi '? — {J2i=i Pi a i) 2 we know that we will have v > v 4 . But to 
compute c 4 we do some additional computations compared to a non adaptive 
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Figure 2: Successive values of s k for 1 < k < p, for procedure a) (the o-line) and 
procedure b) (the *-line), in function of N k (the abscissas axe). The horizontal 
line is at level er*. 



stratified estimator. This has a numerical cost. We thus use the L runs to 
compute the average computation times t 4 and i, respectively of c 4 and c. 

We have i 4 v 4 = 6.29 * 1CT 8 and tv = 7.57 * 1CT 8 . This means that in this 
toy example the numerical cost of our algorithm is not that much balanced by 
the achieved variance reduction. 

3.2 Applications to option pricing 

3.2.1 The setting 

We wish to compare our results with the ones of [GHS99]. 

We will work on the example of the arithmetic Asian option in the Black- 
Scholes model presented in this paper. We shortly present the setting. We have 
a single underlying asset, with price at time t denoted by St. Under the risk 
neutral measure P, the price (S t )t follows the stochastic differential equation, 

dS t = V S t dWt + rS t dt, 

with r the constant interest rate, V the constant asset's volatility, Wt a standard 
Wiener process, and So fixed. 

Let T > be the option's maturity and (t m — 1 ^") 1 < m < d the sequence 
of times when the value of the underlying asset is monitored to compute the 
average. The discounted payoff of the arithmetic Asian option with strike K is 
given by 

m— 1 
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Thus the price of the option is given by 

c = E 



e- rT 



d 

T, s '.- K 



d , 



But in this Black-Scholes setting we can exactly simulate the St m 's using the 
fact that St = So and 



St m = St m _, exp ([r - ^V 2 ]{t m - t m - X ) + Vy/t m - t~[X m ), VI < m < d, 

(3.1) 

where X 1 , . . . ,X d are independent standard normals. Thus, 

c = E[g(X)l D {X)}, 

with g some deterministic function, D — {x € R d : g(x) > 0}, and X a Re- 
valued random variable with law J\f(0,ld). 

In [GHS99] the authors discuss and link together two issues: importance 
sampling and stratified sampling. 

Their importance sampling technique consists in a change of mean of the 
gaussian vector X. Let us denote by h(x) the density of the law A/"(0, Id) and 
by h^(x) the density of the law Af(fi, Id) for any fi e R d . We have, 



] ^ K{ x )d x = n 9 (x + ,)^^- y 



The variance of g(X + fi) ^-d{X + fi) is given by 

( 9{X) J^)- C ) 2h ^ x)dx - 

Heuristically, this indicates that an effective choice of should give weight to 
points for which the product of the payoff and the density is large. In other 
words, if we define G{x) — log <?(x) we should look for that verifies, 

/i = argmax ( G(x) — ^x'x") (3.2) 
x£D \ 2 / 

The most significant part of the paper [GHS99| is aimed at giving an asymp- 
totical sense to this heuristic, using large deviations tools. 
The idea is then to sample g(X + ;u) £^x+l) ^-d{X + /i). 

Standard computations show that for any fi g R d , 

c = E[g(X + M ) e Vx-(i/2)MV lfl(x + _ 

Thus the problem is now to build a Monte Carlo estimator of c = E/ M (X), sam- 
pling f^X) with X ~ Af(0,I d ), and with f^x) = g(x + (i) e -' J -' x -( 1 / 2 ^^l D (x + 
/j.) , for the vector /i satisfying l|3.2p . 

The authors of [ L GHS99] then propose to use a stratified estimator of c = 
E/^pf). Indeed for u S K d with u'u — 1, and a < b real numbers, it is easy to 
sample according to the conditional law of X given u'X £ [a, b]. 
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It can be done in the following way (see Subsection 4.1 of [GHS99] for de- 
tails). We first sample Z = <& -1 (V) with <I> _1 the inverse of the cumulative 
normal distibution, and V = $(o) + U($(b) - $(a)), with U uniform on [0, 1]. 
Second we sample Y ~ J\f(0,ld) independent of Z. We then compute, 

X = uZ + Y -u(u'Y), 

which by contruction has the desired conditional law. 

Let be u £ M. d satisfy u'u = 1. With our notation the stratified estimator c in 
[GHS99J is built in the following way. They take I — 100. As in subsection 13.11 
we denote by y a the a-quantile of the law Af(0, 1). For all 1 < i < I, they take 
Ai = {x e W 1 : f/ j-i < u'x < yi}. That is to say Xi has the conditional law of 
X given < u'X < y» , for all 1 < i < I. As in this setting v! X ~ A/"(0, 1), 
they have Pi = l/I for all 1 < i < /. 

They then do proportional allocation, that is to say, JVj = p^iV for all 1 < 
i < I, where AT is the total number of drawings (in other words qt = pt). Then, 
the variance of their stratified estimator is 

1 1 

i=l 

According to the Introduction, that choice ensures variance reduction. 

The question of the choice of the projection direction u arises. The authors 
take u — p,/(p,'n), with the vector fi satisfying (|3,2j) that has been used for the 
importance sampling. They claim that this provides in practice a very efficient 
projection direction, for their stratified estimator with proportional allocation. 

A s ( J2i=i Pi a i) ^ J2i=iPi a i (i- e - proportional allocation is suboptimal), 
if u is a good projection direction for a stratified estimator with proportional 
allocation, it is a good direction for a stratified estimator with optimal alloca- 
tion. 

In the sequel we take the same direction u and the same strata as in [GHS99J, 
and discuss allocation. Indeed we may wish to do optimal allocation and take 
Qi = q* = Pil7i — . The trouble is the analytical computation of the quantities 

2-.jPj a j 

a* = N{f ll {X)\u'X e{ Vi ^, yj .\), 

is not tractable, at least when / M is not linear. As the p^s are known, this 
is exactly the kind of situation where our adaptive stratified estimator can be 
useful. 

3.2.2 The results 

In all the tests we have taken S = 50, V = 0.1, r = 0.05 and T = 1.0. The 
total number of drawings is A^ = 1000000. 

We call GHS the procedure used in [GHS99| . that is importance sampling 
plus stratified sampling with proportional allocation. We call SSAA our pro- 
cedure, that is the same importance sampling plus stratified sampling with 
adaptive allocation. 
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d 


K 


Price 


variance SSAA 


ratio GHS/SSAA 


16 


45 


6.05 


2.37 x lfr" 


2.04 




50 


1.91 


1.00 x 10~ 7 


35 




55 


0.20 


5.33 x 10~ 9 


39.36 


64 


45 


6.00 


3.36 x 10~ 9 


3.34 




50 


1.84 


9.00 x 10~ 10 


1.60 




55 


0.17 


6.40 x 10" 9 


61 



Table 1: Results for a call option with Sq = 50, V = 0.1, r = 0.05, T = 1.0 and 
N = 1000000 (and 1= 100). 



More precisely in the procedure SSAA we choose N 1 = 100000, N 2 = 
400000, N 3 = 500000 and compute our adaptive stratified estimator c 3 of 
c = E/(A), with the same strata as in GHS. We have used procedure a) for the 
computation of allocations. We denote by c the GHS estimator of c. 

We call «variance GHS» or «variance SSAA» the quantity a, which is an 
estimation of the variance of c or c . More precisely for GHS, 

1 1 

i=l 

where for each 1 < i < I, 

F 3=1 F 3 = 1 

and for SSAA 

i=l 

where for each 1 < i < I, 

N 3 Nf 

1 3 = 1 1 3 = 1 

Tables [T] and [2] show the results respectively for a call option and a put 
option. We call «ratio GHS/SSAA» the variance GHS divided by the variance 
SSAA. In general the improvement is much better for a put option. Indeed the 
variance is often divided by 100 in this case. 



A further analysis can explain these results. We plot on Figure [3] and [4] 
the values of the <7, 's and the estimated values of the conditional expectations 
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d 


K 


Price 


variance SSAA 


ratio GHS/SSAA 


16 


45 


0.013 


7.29 x 10~ iU 


107 




50 


0.63 


7.29 x 10~ 8 


79 




55 


3.74 


2.50 x 10~ 5 


249 


64 


45 


0.011 


5.76 x 10~ 10 


95 




50 


0.62 


5.61 x 10~ 8 


64 




55 


3.69 


1.85 x 10" 5 


58 



Table 2: Results for a put option with Sq — 50, V = 0.1, r = 0.05, T = 1.0 and 
N = 1000000 (and 1 = 100). 




Figure 3: On the left: value of tr, in function of the stratum index i in the case 
of a call option. On the right: estimated value of E/^JQ). (Parameters are the 
same as in Tables [H with d = 64 and K = 45) . 




Figure 4: On the left: value of di in function of the stratum index i in the case 
of a put option. On the right: estimated value of E/ M (Xj). (Same parameters 
than in Figure [3]). 
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E/ /J (Xi)'s, for a call and a put option, with d = 64 and K = 45, a case for which 
the ratio GHS/SSAA is 3.34 in the call case and 95 in the put case. 

We observe that in the case of the put option the estimated conditional 
variance of about 90% of the strata is zero, unlike in the case of the call option. 
These estimated conditional variances are zero, because in the corresponding 
strata the estimated conditional expectations are constant with value zero. 

But these strata are of non zero probability (remember that in this setting 
Pi = 0.01, for all 1 < i < 100). Thus the GHS procedure with proportional 
allocation will invest drawings in these strata, resulting in a loss of accuracy, 
while in our SSAA procedure most of the drawings are made in the strata of 
non zero estimated variance. 

One can wonder if the expectation in the strata of zero observed expectation 
is really zero, or if it is just a numerical effect. We define the deterministic 
function s : R d -> M by 



m — 1 



P =i 



y x = (x 1 ,..., x d y e 



With the previous notations, in the put option case, we have f^(Xi) = a.s., 
and thus E/ AI (A" i ) = 0, if s(Xi + a) > K a.s. (note that i denotes here the 
stratum index and not the component of the random vector Xj). 

Thus the problem is to study, in function of z £ M, the deterministic values 
of s(x + n) for x £ R d satisfying u'x = z. The following facts can be shown. 
Whatever the value of u or z the quantity s(x + fi) has no upper bound. Thus in 
the call option case no conditional expectation Ef fJ ,(X i ) will be zero. To study 
the problem of the lower bound we denote by M the matrix of size dxd given by 



M 



(I 
1 1 

\l ... 





V 



with inverse M 1 



/l 

-1 1 

\0 ... 



... 0\ 



'•. 



and by 1 the <i-sized vector (1, . . . , 1)'. If we use the change of variable 

/ V 2 T Ft \ 

y = M([r--]-l + V^-(x + ,)), 



we can see that minimizing s(x + fi) for x e R d satisfying u'x — z is equivalent 



to minimizing ^ S m =i ex P(y m ) f° r 2/ G M d satisfying 



w y = v, 



(3.3) 



where, 
and 



to = (M )u 



v = v! ([r - + Vsj ±(x + /i)) = Vsj^(z + u'u) + (r-^-)J2 



V 2 



m—1 



21 




Figure 5: Value of the component u m of u £ M. d in function of to. 



If all the components of w are stricly positive the lower bound of s(x + fx) 
under the constraint u'x — z is 



* So 
s = — x exp 
a 



J2m=l W m log W r , 

Ed 
m=l Wm 



(3.4) 



If all the components of w are stricly negative we get the same kind of result 
by a change of sign. Otherwise the lower bound is zero: it is possible to let the 
y m, s tend to — oo with l|3.3p satisfied. 

In the numerical example that we are analysing the direction vector u is the 
same in the call or put option cases, and its components are stricly positive and 
decreasing with the index (see Figured]). Thus the components of w are strictly 
positive and the lower bound is given by s* defined by l|3.4p . With z taking 
values in the 90 last strata we have s* > 45. Thus the conditional expectations 
E/^pTj) are truly zero in these strata. 

We can then wonder if it is worth stratifying the part of the real line corre- 
sponding to these strata, in other words stratifying Mr and not only D. Maybe 
stratifying D and making proportional allocation will provide a sufficient vari- 
ance reduction. But this would require a first analysis, while our SSAA proce- 
dure avoids automatically to make a large number of drawings in D c . 

To conclude on the efficiency of our algorithm in this example let us notice 
that the computation times of the GHS and SSAA procedures are nearly the 
same (less than 1% additional time for the SSAA procedure). Indeed, unlike 
in the toy example of Subsection I3.1[ the computation time of the allocation 
of the drawings in the strata is almost negligible in comparison to the other 
calculations (drawings etc.). 



4 Appendix 

We justify the use of procedure b) in the following proposition. 
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Proposition 4.1 When o k ^ x > for some 1 < i < I, by computing at Step k 
the m k 's with the procedure b) described in Section^ we find (m k , . . . , m k ) £ 
that minimizes 

under the constraint Ei=i m i = — N k ~ 1 — I. 

Proof. First note that if of -1 — for some index i it is clear that we have to set 
m\ = and to rewrite the minimization problem for the indexes corresponding 
to erf -1 > 0. This corresponds to the very beginning of procedure b). 

For the seek of simplicity, and without loss of generality, we consider in the 
sequel that o^ -1 > for all 1 < i < 7, and thus work with the indexation 
{1,...,/}. 

We will note M = N k — N k ~ 1 - 7, and, for all 1 < i < I, rii = N^ 1 + 1, 
on = pid k_1 , and m ; = m k . We thus seek (mi, . . . , to/) G IR^_ that minimizes 

E t i — ~r — under the constraint V\. , m, = M . 

Step 1: Lagrangian computations. We write the Lagrangian corresponding 
to our minimization problem, for all (m, A) £ x R: 



£(m,A)=£ 



or 



A(^TOi - M) = ^/i^rni, A) - AM. 



i=l 



with h 



for all i. 

We first minimize £(m, A) with respect to m for a fixed A. 
For any A £ R let us denote m(A) := argmin mgR i £(m, A). 
Minimizing £(m, A) with respect to m is equivalent to minimizing hifrrii, A) 
with respect to m., for all i. 

The derivative of each hi{., A) has the same sign as — of + A(rti + x) 2 . 

If A < we have m(A) = (oo, . . . oo). 

If A > there are two cases to consider for each fu: 



•2 

either A > ^ and rrii(X) = 0, 



or A < 2| and to;(A) = yafA 
To sum up we have 



£(m(A),A) = < 





Ei=i 



1 „2 ^ 



1 „2 (2a,vA — ro,A) 



(4.1) 

if A < 0, 
if A = 0, 

MX if A > 0. 



We now look for A* that maximizes £(m(A), A). For all A £ (0, oo) we have, 
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d\C(m 



1 — 1 n .i 



This function is continuous on (0,+cc), equal to — M for A > max^, de- 

ct 2 1 

creasing on (0,max^ -4] and tends to +oo as A tends to 0. We deduce that 

a 2 

A t } £(m(A), A) reaches its unique maximum at some A* E (0, max^ -§■). 



Ifd A £(m(^),^ 



< for all 1 < i < L we set i* = 0. 



Otherwise we sort in increasing order the af/nf's, index with (i) the ordered 
quantities, and note i* the integer such that 



L >,2 ^,2 ^,2 



M^P),^F)>0 and d x c(m(^),^^)<0. (4.3) 

2 2 2 

Then A* belongs to [^p-, or (0, %p-) if i* = 0. But on this interval 

n (i*) "(1) 



d x £(m(X),X)= J2 (^7=- n U))- M - 
j-i'+i v 



As d\C(m(X*), A*) = we have, 



M+ Y, Hi) 

i=i*+l 



Clearly, if i* 7^ 0, A* > ^ is equivalent to i < z*. If i* = then A* < 



"(<) ,l t« 

for all 1 < i < J. Thus, according to l|4.ip . we have m^(\*) = if i < i* , and 

if i > i*, 

1 

M+ J2 n U) 

j—i* +1 

= a (i)- J n (i)- ( 4 - 4 ) 

We have thus found (m(A*), A*) that satisfies 

£(m(A*),A*) = max min £(m, A), 

which implies that £(m(A*),A*) < £(m, A*) for all m € R+. Besides (|4.4p 

implies ELi TO *( A *) = M and £(m(A*),A*) = £(m(A*),A) for all A e M. 
Therefore (m(A*), A*) is a saddle point of the Lagrangian and m(A*) solves the 
constrained minimization problem. 
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Step 2. We now look for a criterion to find the index i* satifying l|4.3p . If 
i* =^ 0, we have the following equivalences using the concavity of A 1— > £(m(A), A) 
and {4~2"1) 




M + E U (3) 



> 



j=i+l 



I 




3=i+l 



In the same manner 



M + £ n U) 



i* = o 




7 



j=i+l 



, VI < i < I. 



j=i+i 



The proof of Proposition 14.11 in then completed: in Points i) and ii) of 
procedure b) we find the index i* mentionned in Step 1, using the criterion of 
Step 2. In Point iii) we compute the solution of the optimization problem using 
the results of Step 1. 
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